/* Replication materials for:
Transgender and Gender Diverse People Disproportionately Report Problems while
Trying to Vote Than Cisgender People
*/

**** In-text results first:

*Load replication data: (assuming working directory is already set to where the downloaded data file is)
use "pooled_cces.dta", clear

* Number and percent transgender or gender diverse:
tab new_trans

* Table 1 
*new_trans (0 = cisgender) (1 = transgender/gender diverse)
*prob_any (0 = No problems) (1= Problems)
*prob_type (1 = Problem with ID/Reg) (2 = Other Problem) (3  = No problem)
tab prob_any new_trans, col chi 
*chi2 statistic is shown in scientific notation, so display the specific value after each table
*(r(chi2) holds only the most recent tabulate result):
di `r(chi2)'
tab prob_type new_trans, col chi
di `r(chi2)'

* Odds and relative risk ratios and associated 95% CIs (reported in-text)
logistic prob_any i.new_trans
mlogit prob_type i.new_trans, base(3) rrr

* Table 2: Regression results & predicted probabilities with CIs (reported in-text)
*ALSO Table S3 (full results)
* Model 1
logit prob_any i.new_trans i.year 

* Model 2
logit prob_any i.new_trans log_age i.gender i.race_2 i.educ_2 i.year 

* Probabilities from Model 2 (reported in-text - rounded)
margins new_trans

* Model 3
melogit prob_any i.new_trans##c.voter_id  log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans

* Model 4
melogit prob_any i.new_trans##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans

* Model 5
melogit prob_any i.new_trans##c.voter_id##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans


* Figure 1:
** Computationally intensive - "force" option is used to speed up
** Predicted probabilities: note these estimates are stored in a separate Excel file
** R-file provides code to reproduce figure
*** Fit model 5 first then run the following code:
margins new_trans, at(voter_id==(1(.2)5) id_doc==(1 6)) force

** Categorical model - for Low Medium High interactions:
melogit prob_any i.new_trans##i.cat_id_doc##i.cat_vid  log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans

margins new_trans#cat_id_doc#cat_vid, force


* Table 3:
** Computationally intensive
** Fit model 5 first then run the following code:
margins 1.new_trans, at(voter_id==(1 5) id_doc==(1 6)) force post
*output should be
/*
Row 1, Column 1  1 1  |   .0765315   .0109252     7.01   0.000     .0551185    .0979444
Row 2, Column 1  2 1  |   .0384461   .0216425     1.78   0.076    -.0039724    .0808646
Row 1, Column 2  3 1  |   .0322217   .0135688     2.37   0.018     .0056274     .058816
Row 2, Column 2  4 1  |   .0955648   .0239805     3.99   0.000     .0485639    .1425657
*/

* linear combinations hypothesis tests:
* Row 1, Column 3:
lincom -1*(3._at#1.new_trans - 1._at#1.new_trans)
* Row 2, Column 3:
lincom -1*(4._at#1.new_trans - 2._at#1.new_trans)
* Row 3, Column 1:
lincom 1._at#1.new_trans - 2._at#1.new_trans
* Row 3, Column 2:
lincom 3._at#1.new_trans - 4._at#1.new_trans
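
* Optional sketch (not part of the original materials): a hand check of the
* lincom point estimates, using the four predicted probabilities listed in the
* expected-output comment above (_at 1 through 4). Each display line should
* match the point estimate reported by the corresponding lincom.
* Row 1, Column 3: -1*(_at 3 - _at 1)
display %9.7f -1*(.0322217 - .0765315)
* Row 2, Column 3: -1*(_at 4 - _at 2)
display %9.7f -1*(.0955648 - .0384461)
* Row 3, Column 1: _at 1 - _at 2
display %9.7f .0765315 - .0384461
* Row 3, Column 2: _at 3 - _at 4
display %9.7f .0322217 - .0955648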

/* This completes all results in the main paper */

***** APPENDICES

* Appendix 2:

** Table S2:
sum prob_any new_trans age female ibn.race_2 ibn.educ_2 ibn.year voter_id id_doc pct_trump

* Appendix 3:

** Table S3 (already reported above)

** Table S4:

*** Model 6:
melogit prob_any i.new_trans##i.cat_vid  log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans

*** Model 7:
melogit prob_any i.new_trans##i.cat_id_doc  log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans

*** Model 8:
melogit prob_any i.new_trans##i.cat_id_doc##i.cat_vid  log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: new_trans


* Appendix 4:

* R-file contains sensitivity analysis

* Appendix 5:

* R-file contains nonparametric bounds

* R-file contains simulation for S4

* R took a long time for complex simulations, so we used Stata

** Simulations for S.5
****simulation methods for the ME logit

* my_reg: reclassify each transgender/gender diverse respondent as cisgender
* with probability eta, refit the model, and post the predicted probabilities
capture program drop my_reg
program define my_reg
	args eta
	drop newtreat
	gen newtreat = 0 if new_trans==0
	replace newtreat = rbinomial(1, 1-`eta') if new_trans==1
	melogit prob_any i.newtreat##c.id_doc##c.voter_id log_age i.gender i.race_2 i.educ_2 i.year pct_trump || st_num: newtreat
	margins newtreat, at(voter_id==(1 5) id_doc==(1 6)) nose noestimcheck post
end

forvalues i = .2(.3).8 {
clear
use "pooled_cces.dta", clear
gen newtreat = .
local j = `i'*10
simulate _b, reps(100) seed(585`j') saving(modpred_`j'): my_reg `i'
}

*** will store multiple data files for predicted probabilities (uploaded)
*** Because this is a simulation, seed is set in the code.
*** these are in their own .dta files, and the R-file creates the figure
*simulate replaces the current dataset with the simulations: each column is an estimate from margins,
*and each row is one simulation
*starting with 100 reps
*to add more reps, multiply j by another factor to change the seed & file names - the files can then be combined with rbind() in R.
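
* Optional sketch (not part of the original materials): stack the saved
* simulation files in Stata rather than with rbind() in R.
* Assumes the loop above wrote modpred_2.dta, modpred_5.dta, and modpred_8.dta
* to the working directory; the variable name sim_eta is illustrative.
clear
gen sim_eta = .
foreach j in 2 5 8 {
	append using "modpred_`j'.dta"
	* rows just appended have no sim_eta yet, so tag them with their eta
	replace sim_eta = `j'/10 if missing(sim_eta)
}
* each row is one simulation; sim_eta records which eta produced it
tab sim_eta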

* Appendix 6:

***requires loading the data from Prolific
use "prolific_aux.dta", clear

*N = 400

* samp (401 = "Cisgender Study") (402 = "Transgender Study")
* Age summary (in-text):
tabstat age, by(samp) s(mean sd min max)

* Race distribution (in-text):
tab race samp, col

** Table S6:
* trans_two: two-step
* ces: CES question
tab ces trans_two, col
* trans_q9: indicated trans with write-in responses:
tab ces trans_q9, col
* by sample type (Prolific defined):
tab ces samp, col
* by gender identity (Prolific defined): (ordering differs from the table)
tab ces gend_prol, col

** voting and problem (in-text)
tab Q6 samp, col

*** problems: prob_idreg (0 = No; 1= yes)
tab prob_idreg samp, col

* Appendix 7: 

*** back to CES pooled sample:
use "pooled_cces.dta", clear

** Table S7

*** Model 9
logit prob_any i.new_trans polknow log_age i.gender i.race_2 i.educ_2 i.year 

*** Model 10
logit prob_any i.new_trans log_age i.gender i.race_2 i.educ_2 i.year if polknow<3

*** Model 11
logit prob_any i.new_trans log_age i.gender i.race_2 i.educ_2 i.year if polknow>=3

*** Model 12
etregress prob_any polknow log_age i.gender i.race_2 i.educ_2 i.year, treat(new_trans = polknow log_age i.gender i.race_2 i.educ_2 i.year)

** Table S8

*** Model 13
melogit prob_any i.new_trans##c.voter_id##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if polknow<3 || st_num: new_trans 
**** probabilities for Figure S6: also stored in a separate Excel file
**** code for figures are in the R-file
margins new_trans, at(voter_id==(1(.2)5) id_doc==(1 6)) force

*** Model 14
melogit prob_any i.new_trans##c.voter_id##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if polknow>=3 || st_num: new_trans 
**** probabilities for Figure S6: (not the SEs):
margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) force nose

** SE estimation was computationally intensive - so I bootstrapped the estimation:
** Only use the bootstraps to store the SEs (use direct point estimates)
** All estimates are stored in a separate Excel file
** Figure created with code in the R-file

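* my_reg2: refit Model 14 (polknow>=3, matching the margins above) and post the predicted probabilities
capture program drop my_reg2
program define my_reg2
	melogit prob_any i.new_trans##c.id_doc##c.voter_id log_age i.gender i.race_2 i.educ_2 i.year pct_trump if polknow>=3 || st_num: new_trans
	margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) nose noestimcheck post
end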

bootstrap _b, saving(bs_highknow) reps(500) seed(54511): my_reg2
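
* Optional sketch (not part of the original materials): inspect the bootstrap
* results left in e() by the command above before copying the SEs elsewhere.
* bootstrap standard errors, one per posted margin:
matrix list e(se)
* percentile-based confidence intervals from the same replications:
estat bootstrap, percentile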

** Table S9

*** Model 15
logit prob_any i.new_trans i.year if age<=29

*** Model 16
logit prob_any i.new_trans i.gender i.race_2 i.educ_2 i.year if age<=29

*** Model 17
melogit prob_any i.new_trans##c.voter_id  i.gender i.race_2 i.educ_2 i.year pct_trump if age<=29 || st_num: new_trans

*** Model 18
melogit prob_any i.new_trans##c.id_doc i.gender i.race_2 i.educ_2 i.year pct_trump if age<=29 || st_num: new_trans

*** Model 19
melogit prob_any i.new_trans##c.voter_id##c.id_doc i.gender i.race_2 i.educ_2 i.year pct_trump if age<=29 || st_num: new_trans

* probabilities for Figure S7 (nose used because of complexity):
margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) nose

* bootstrap SEs as before:
capture program drop my_reg4
program define my_reg4
	melogit prob_any i.new_trans##c.id_doc##c.voter_id i.gender i.race_2 i.educ_2 i.year pct_trump if age<=29 || st_num: new_trans
	margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) nose noestimcheck post
end

bootstrap _b, reps(500) seed(4891): my_reg4

* again estimates are stored in separate Excel file
* figure is created by the R-file code

** Table S10

*** Model 20
logit prob_any i.new_trans i.year if validated==1

*** Model 21
logit prob_any i.new_trans log_age i.gender i.race_2 i.educ_2 i.year if validated==1

*** Model 22
melogit prob_any i.new_trans##c.voter_id log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==1 || st_num: new_trans

*** Model 23
melogit prob_any i.new_trans##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==1 || st_num: new_trans

*** Model 24
melogit prob_any i.new_trans##c.voter_id##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==1 || st_num: new_trans

* probabilities for Figure S8
margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) nose

* Bootstrap SEs for Figure S8
capture program drop my_reg5
program define my_reg5
	melogit prob_any i.new_trans##c.id_doc##c.voter_id log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==1 || st_num: new_trans
	margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) nose noestimcheck post
end

bootstrap _b, reps(500) seed(6648): my_reg5

* again estimates are stored in separate Excel file
* figure is created by the R-file code

** Table S11

*** Model 25
logit prob_any i.new_trans i.year if validated==.

*** Model 26
logit prob_any i.new_trans log_age i.gender i.race_2 i.educ_2 i.year if validated==.

*** Model 27
melogit prob_any i.new_trans##c.voter_id log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==. || st_num: new_trans

*** Model 28
melogit prob_any i.new_trans##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==. || st_num: new_trans

*** Model 29
melogit prob_any i.new_trans##c.voter_id##c.id_doc log_age i.gender i.race_2 i.educ_2 i.year pct_trump if validated==. || st_num: new_trans

* probabilities and SEs/CIs for Figure S9
margins new_trans, at(voter_id==(1(1)5) id_doc==(1 6)) 

* again estimates are stored in separate Excel file
* figure is created by the R-file code

* Appendix S9

** Table S12, CES 2016-2020 columns:
sum prob_any age ibn.race_2 ibn.educ_2 if new_trans==1

use "ces_2022.dta", clear

** in-text findings:
tab transgender
* Transgender and Nonbinary indicator (1 = TNB; 0 = Cis)
tab new_trans

** Table S12, CES 2022
sum prob_any age ibn.race_2 ibn.educ_2 if new_trans==1 & prob_any!=.

** Table S13
tab prob_any new_trans, col chi
tab prob_type new_trans, col chi

* odds ratios/relative-risk ratios (in-text)
logistic prob_any i.new_trans
mlogit prob_type i.new_trans, base(3) rrr

** Table S14

*** Model 30
logit prob_any i.new_trans

*** Model 31
logit prob_any i.new_trans log_age i.female i.race_2 i.educ_2


****FIN.
